rl beat sota reward-based rl
UC Berkeley Reward-Free RL Beats SOTA Reward-Based RL
End-to-end Deep Reinforcement Learning (DRL) is a trending training approach in the field of computer vision, where it has proven successful at solving a wide range of complex tasks that were previously regarded as out of reach. End-to-end DRL is now being applied in domains ranging from real-world and simulated robotics to sophisticated video games. However, as appealing as end-to-end DRL methods are, most rely heavily on reward functions in order to learn visual features. This means feature-learning suffers when rewards are sparse, which is the case in most real-world scenarios. ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference. Random shift, a stochastic data augmentation to the observations is applied within each training batch.